In this Insight, I’ll cover the basics of generative adversarial networks, and then I’ll look at recent advances from the Radiant Earth Foundation, which is using generative adversarial networks to address the scarcity of earth observation training data for machine learning applications in geospatial analysis.
A generative adversarial network (GAN) has an unusual architecture: two neural networks compete against each other in what’s called a zero-sum game, where one network’s gains are the other’s losses. The first network is the generator: it takes in random “noise” and tries to produce convincing fake “instances” (e.g., fake images). The second network is the discriminator, which takes in these instances and tries to classify them as either fake or real. This is where the zero-sum game becomes apparent: as either the generator or the discriminator improves, the other network finds its task increasingly difficult. Below, I have a high-level graphical representation of the GAN architecture.
Figure 1: the basic GAN architecture.
Notice that, based on the results of the discriminator, both the generator and discriminator are updated. Another perk of GANs is their versatility: GANs can be supervised, semi-supervised, or unsupervised, which allows them to be applied to a wide variety of tasks.
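To make the zero-sum dynamic concrete, here’s a toy sketch of adversarial training in plain NumPy: a one-layer generator tries to match a 1D Gaussian while a logistic-regression discriminator tries to tell its samples apart from the real ones. This is purely illustrative (all parameters and the target distribution are made up for this example), not the architecture from any article discussed here.

```python
# Toy GAN in pure NumPy: the generator maps noise z to x_fake = g_w*z + g_b,
# the discriminator is D(x) = sigmoid(d_w*x + d_b). Both are updated every
# step, mirroring the two-way update in Figure 1.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

g_w, g_b = 0.1, 0.0          # generator parameters
d_w, d_b = 0.1, 0.0          # discriminator parameters
lr = 0.05
real_mu, real_sigma = 4.0, 0.5   # the "real" data distribution

for step in range(2000):
    z = rng.standard_normal()
    x_real = real_mu + real_sigma * rng.standard_normal()
    x_fake = g_w * z + g_b

    # Discriminator update: push D(x_real) toward 1 and D(x_fake) toward 0
    # (gradient descent on the binary cross-entropy loss).
    d_real = sigmoid(d_w * x_real + d_b)
    d_fake = sigmoid(d_w * x_fake + d_b)
    d_w += lr * ((1 - d_real) * x_real - d_fake * x_fake)
    d_b += lr * ((1 - d_real) - d_fake)

    # Generator update: push D(G(z)) toward 1, i.e. fool the discriminator
    # (gradient ascent on log D(G(z))).
    x_fake = g_w * z + g_b
    d_fake = sigmoid(d_w * x_fake + d_b)
    grad_x = (1 - d_fake) * d_w
    g_w += lr * grad_x * z
    g_b += lr * grad_x

fakes = g_w * rng.standard_normal(1000) + g_b
print(float(fakes.mean()))   # the fake samples drift toward real_mu
```

Note the zero-sum structure in the two updates: the discriminator’s step decreases exactly the quantity (the probability assigned to fakes) that the generator’s step tries to increase.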
It has been shown that GANs perform particularly well on image-related tasks: from producing fake human faces to filling in missing portions of images, GANs consistently produce impressive, realistic results (although the images sometimes break down a bit on closer inspection). I’ve included some examples below of photo inpainting and a fake face produced by GANs.
Figure 2: a fake face from thispersondoesnotexist.com
Figure 3: an example of photo inpainting, taken from the image-to-image translation article (see the next section).
See this site for more applications of GANs.
An application of particular interest is image-to-image translation. The idea here is fairly self-explanatory: given a new image, the GAN learns to translate that image in a certain prescribed manner. That’s still somewhat vague, so let’s look at a few examples from the article entitled “Image-to-Image Translation with Conditional Adversarial Networks.” One of the many tasks tackled in the article was the translation from an edge map to a photo-realistic image. The authors also used GANs to translate daytime images to nighttime. See below for a few visual examples.
Figure 4: a daytime image translated to nighttime.
Figure 5: an edge map of a purse translated to a photo-realistic purse. Closer inspection might reveal flaws, but the translation is surprisingly accurate given how little information is encoded in the edge map.
In the article, the authors find that a standard loss-function formulation generalizes well to a variety of tasks. They released their results as pix2pix, a high-level software package that allows people to interact with their GANs. Implementations have also been ported over to TensorFlow. You can check out more results from the article at this site, and you can read the article here.
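The loss the pix2pix authors settle on combines an adversarial term with an L1 reconstruction term that keeps the output close to the target image (the paper weights the L1 term with λ = 100). Here’s a rough sketch of that generator objective; the arrays are random placeholders, not real network outputs.

```python
# Sketch of the pix2pix-style generator objective: adversarial loss plus a
# weighted L1 reconstruction loss. All inputs here are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(1)

def generator_loss(d_fake_probs, fake_img, target_img, lam=100.0):
    """d_fake_probs: discriminator scores D(x, G(x)) in (0, 1).
    fake_img / target_img: generated and ground-truth images."""
    adv = -np.mean(np.log(d_fake_probs + 1e-8))   # reward fooling D
    l1 = np.mean(np.abs(target_img - fake_img))   # reward staying near target
    return adv + lam * l1

d_fake = rng.uniform(0.1, 0.9, size=(4,))          # one score per image
fake = rng.uniform(0, 1, size=(4, 64, 64, 3))      # generated batch
real = rng.uniform(0, 1, size=(4, 64, 64, 3))      # target batch
loss = generator_loss(d_fake, fake, real)
print(loss > 0)
```

Intuitively, the adversarial term supplies realism (sharp textures) while the L1 term supplies fidelity to the input; either alone tends to give blurry or structurally wrong translations.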
An aside: I had thought about doing this Insight on the article “Satellite Image Spoofing: Creating Remote Sensing Dataset with Generative Adversarial Networks,” which is a super cool application of image-to-image translation. The authors use a GAN to learn the “sense” of a given city (New York, Seattle, and Beijing) by learning to translate basemap tiles of that city to fake aerial photos. Then, given basemap tiles of Corvallis (Oregon), the “sense” of the city that the GAN was trained on is transferred over. This creates hybrid fake aerial photos, where the urban style of one city is meshed with the actual urban features of another. You can check out the article here.
Ok, finally we’ve made it to the application of interest. This research was carried out by the Radiant Earth Foundation, whose mission is to provide accessible and accurate earth observation training data for machine learning applications in geospatial analysis. In this project, they’ve pitted a GAN against a convolutional neural network (CNN) on a very particular task of immense importance in this field of study: classifying land-cover attributes in a given image. For each pixel in a satellite image, the GAN or CNN must assign one of six distinct classes: open water, developed, forest, grassland, pasture, or cultivated.
The training and testing data used here are composed of Sentinel-2 images at a 10-meter resolution, with class labels taken from the National Land Cover Database (collected by the USGS). The training set consisted of about 16,000 images and the test set of about 7,000. Pitting a state-of-the-art CNN against a comparable GAN (both with a similar number of trainable parameters), the folks at the Radiant Earth Foundation found that the GAN was better able to generalize to unseen data than the CNN. I’ve included some results from the article below, showing the actual land-cover classes alongside the classes predicted by the GAN and the CNN.
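Since every pixel gets its own label, this is a semantic-segmentation problem, and the natural metrics are pixel-wise. Here’s a small sketch of how such per-pixel predictions might be scored for the six classes; the label maps below are synthetic stand-ins, not the actual Sentinel-2/NLCD data.

```python
# Scoring a per-pixel land-cover prediction: overall and per-class pixel
# accuracy over the six classes from the article. Labels here are synthetic.
import numpy as np

CLASSES = ["open water", "developed", "forest", "grassland", "pasture", "cultivated"]

rng = np.random.default_rng(42)
truth = rng.integers(0, 6, size=(256, 256))       # ground-truth class per pixel
pred = truth.copy()
flip = rng.random(truth.shape) < 0.1              # corrupt ~10% of pixels
pred[flip] = rng.integers(0, 6, size=int(flip.sum()))

overall = float((pred == truth).mean())           # fraction of pixels correct
per_class = {name: float((pred[truth == i] == i).mean())
             for i, name in enumerate(CLASSES)}
print(round(overall, 2))
```

Per-class accuracy matters here because land-cover classes are rarely balanced: a model can score a high overall accuracy while badly misclassifying a rare class such as open water.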
Figure 6: results from the land cover classification task. The class colors are: red: developed; blue: open water; cyan: pasture; dark green: forest; light green: grass; brown: cultivated.
It’s worth emphasizing the difficulty of this task: the networks need to classify each pixel of the image. The level of accuracy achieved here blows my mind! Give the article a read for yourself. The full results will be published in a forthcoming publication, but this is a nice preview.
I have a few critiques of the work so far, which I’m sure will be addressed in the coming publication. Mostly, I’m wondering whether the model will generalize: the GAN is trained on data from the continental US, but the techniques are most needed in the global south (predominantly LMICs). This makes sense: the data for the US, and probably most other high-income countries, is already very complete and accurate. In lower-income countries, however, gathering accurate “ground data” (data gathered manually on site) is often infeasible, both in terms of cost and access to regions of interest. And since land cover is constantly changing, the need for frequent updates further complicates things. We can clearly see why these techniques are important, but also why generalizing might be particularly difficult. I’m interested to see how they account for this issue in the final article.
Ultimately, this is a very interesting area of research and the Radiant Earth Foundation is of course doing great work. Check out some of the links included throughout the Insight for further reading!